This is an R Markdown Notebook
Deep Learning… In this lecture I would like to explore with you the deep learning we can do with the H2O package in R…
Used example from: [https://dzone.com/articles/anomaly-detection-with-deep-learning-in-r-with-h2o](https://dzone.com/articles/anomaly-detection-with-deep-learning-in-r-with-h2o)
More reading: [https://dzone.com/articles/the-basics-of-deep-learning-how-to-apply-it-to-pre?fromrel=true](https://dzone.com/articles/the-basics-of-deep-learning-how-to-apply-it-to-pre?fromrel=true)
And: [https://shiring.github.io/machine_learning/2017/05/01/fraud](https://shiring.github.io/machine_learning/2017/05/01/fraud)
In this lecture we will explore this ‘technology’ on a sample dataset and try to do it all in a 10-minute lecture!
The ‘Thing’ will be working on your computer. In fact you will install another ‘computer’ inside your ‘computer’…
I got these instructions from [http://h2o.ai/download/](http://h2o.ai/download/). Simply use the ‘Install from R’ option.
As you see, this will actually start the ‘cluster’ on our machine! We can now look at localhost:54321 to see the ‘interface’ of our cluster…
As a trial let’s learn how to shut down the machine…
# we can shut down the 'machine' like this...
h2o.shutdown(prompt= FALSE)
In this case I have installed the H2O Machine Learning Platform on my PC, but in the real world you may install it on a more powerful computer.
We will now run the demo from H2O. Our goal will be to teach the system what is normal by using an ECG dataset. After that we will use another dataset that contains anomalies, and use our Deep Learning model to detect them.
First, we will launch the machine again…
# to load the library
library(h2o)
# to initialize the 'machine'
h2o.init()
Then we will download the datasets from h2o.
# Import ECG train and test data into the H2O cluster
train_ecg <- h2o.importFile(
path = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/anomaly/ecg_discord_train.csv",
header = FALSE,
sep = ",")
test_ecg <- h2o.importFile(
path = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/anomaly/ecg_discord_test.csv",
header = FALSE,
sep = ",")
Personally, I like to know what is in the data and what it looks like!!! I will just put this link into the browser, which downloads the files with the raw data. Opening the dataset as-is does not tell me much, so I will plot the data in Excel…
Train data set
Test data set
Or, I can also pull the data from h2o into R and make some 3D visualizations…
library(tidyverse)
# data frame matrix for training dataset
matrix_train <- train_ecg %>% as.data.frame() %>% as.matrix.data.frame()
# data frame matrix for test dataset
matrix_test <- test_ecg %>% as.data.frame() %>% as.matrix.data.frame()
From there we can see that the difference between the two is in the three new rows, 21-23.
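A quick sanity check on the shapes confirms where the difference sits (a sketch; the exact dimensions are what I saw on my run and may differ):

```r
# compare the shapes of the two matrices; the test set
# carries three extra rows (21-23) holding the anomalies
dim(matrix_train)   # e.g. 20 rows of "normal" heartbeats
dim(matrix_test)    # e.g. 23 rows - three rows more than training
```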
# using library plotly to plot 3D surface
library(plotly)
plot_ly(z = matrix_train, type = "surface")
Train Dataset with plotly
# using library plotly to plot 3D surface
plot_ly(z = matrix_test, type = "surface")
Test Dataset with plotly
Now that we know what our data looks like, we can start building our Anomaly Model.
# Train deep autoencoder learning model on "normal"
# training data, y ignored
anomaly_model <- h2o.deeplearning(
x = names(train_ecg),
training_frame = train_ecg,
activation = "Tanh",
autoencoder = TRUE,
hidden = c(50,20,50),
sparse = TRUE,
l1 = 1e-4,
epochs = 100)
Let’s use this model on our training dataset…
# compute error of the model
mod_error <- h2o.anomaly(anomaly_model, train_ecg)
Let’s get it as a plot and see that the values are very low:
# visually see it
h2o.anomaly(anomaly_model, train_ecg) %>%
as.data.frame() %>% plot.ts(ylim = c(0, 2))
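One way to turn this into a usable detector is to derive a cutoff from the error the model makes on normal data (a sketch; the 0.99 quantile is my own assumption, not part of the H2O demo):

```r
# pull the training reconstruction error into R
train_error <- as.data.frame(mod_error)
# take a high quantile of the error on "normal" data as the cutoff;
# anything above it on new data will be treated as an anomaly
threshold <- quantile(train_error$Reconstruction.MSE, 0.99)
threshold
```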
Once our model is trained, we can use it to detect anomalies in our test dataset.
# Compute reconstruction error with the Anomaly
# detection app (MSE between output and input layers)
recon_error <- h2o.anomaly(anomaly_model, test_ecg)
# Pull reconstruction error data into R and
# plot to find outliers (last 3 heartbeats)
df_recon_error <- as.data.frame(recon_error)
tail(df_recon_error, 9)
We can plot this as well
plot.ts(df_recon_error)
What this tells us is that, in the new data, we have anomalies in elements 21, 22, and 23.
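To pick out the anomalous rows programmatically rather than by eye, we can compare each row’s error against a cutoff (a sketch; the 0.01 value is an assumption read off the plots above):

```r
# indices of rows whose reconstruction error exceeds the cutoff
anomalous_rows <- which(df_recon_error$Reconstruction.MSE > 0.01)
anomalous_rows  # should point at heartbeats 21, 22, 23
```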
Now we can obtain predictions, i.e. the reconstructed physical values, using our model. We need to provide the model and the test dataset.
# Note: Testing = Reconstructing the test dataset
test_recon <- h2o.predict(anomaly_model, test_ecg)
head(test_recon)
In order to make things visible, once again I ask Excel to help… Here I simply write the dataframe to CSV and create a graph in Excel.
# write to csv to use it in excel
test_recon %>% as.data.frame() %>%
write.csv("test_predicted.csv")
Predicted data set
Or we can visualize in 3D directly in R
# making a matrix dataframe
recon_matrix <- test_recon %>% as.data.frame() %>% as.matrix.data.frame()
# make 3D plot
plot_ly(z = recon_matrix, type = "surface")
Predicted Dataset with plotly
To use our model in our ShinyApp we will save it…
h2o.saveModel(anomaly_model, "C:/Users/fxtrams/Downloads/tmp/anomaly_model.bin")
h2o.download_pojo(anomaly_model, "C:/Users/fxtrams/Downloads/tmp", get_jar = TRUE)
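For the ShinyApp we will also need the mirror operation, loading the model back in a fresh session (a sketch; note that h2o.saveModel returns the full path of the saved model, which is the easiest thing to pass to h2o.loadModel):

```r
# h2o.saveModel returns the path where the model was written;
# keep it so the model can be restored in another session
model_path <- h2o.saveModel(anomaly_model, "C:/Users/fxtrams/Downloads/tmp")
restored_model <- h2o.loadModel(model_path)
```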
And let’s not forget to switch off our cluster!
h2o.shutdown(prompt= FALSE)
In this example the Anomaly Detection model was able to detect the anomaly in rows 21-23.
It learned the pattern from many normal vectors and was able to distinguish the anomalies arriving in the new dataset.
The practical use of this model is the function h2o.anomaly: when the MSE value is high, an anomaly is detected!
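That practical use can be wrapped in a small helper (hypothetical function name and threshold argument; the cutoff would come from the training error, as discussed above):

```r
# hypothetical helper: TRUE for every row the model flags as anomalous
is_anomaly <- function(model, new_data, threshold) {
  mse <- as.data.frame(h2o.anomaly(model, new_data))$Reconstruction.MSE
  mse > threshold
}
# usage: is_anomaly(anomaly_model, test_ecg, threshold = 0.01)
```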
Our next step will be to repeat the procedure, but on our own machine data.